Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores
Abstract
The Log-Structured-Merge (LSM) Tree has gained much attention recently because of its superior performance in write-intensive workloads. An LSM Tree uses an append-only structure in memory to achieve low write latency; when memory capacity is reached, in-memory data are flushed to other storage media (e.g., disk). Consequently, read access is slower than write access. These specific features of LSM, including no in-place update and asymmetric read/write performance, raise unique challenges in index maintenance for LSM. The structural difference between LSM and B-Tree also prevents mature B-Tree-based approaches from being directly applied. To address the issues of index maintenance for LSM, we propose Diff-Index, which supports a spectrum of index maintenance schemes to suit different objectives in index consistency and performance. The schemes consist of sync-full, sync-insert, async-simple and async-session. Experiments on our HBase implementation quantitatively demonstrate that Diff-Index offers a range of performance/consistency trade-offs and satisfactory scalability while avoiding global coordination. Sync-insert and async-simple can reduce the overall index update latency by 60%-80% compared to the baseline sync-full; async-simple can achieve superior index update performance with an acceptable level of inconsistency. Diff-Index exploits LSM features such as versioning and the flush-compact process to achieve the goals of concurrency control and failure recovery with low complexity and overhead. Diff-Index is included in IBM InfoSphere BigInsights, an IBM big data offering.

∗Work done while author was at IBM Almaden Research Center. †Work done while author was an intern at IBM T. J. Watson Research Center.
(c) 2014, Copyright is with the authors. Published in Proc. EDBT on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.
Similar Resources
Write-Optimized Indexing for Log-Structured Key-Value Stores
The recent shift towards write-intensive workloads on big data (e.g., financial trading, social user-generated data streams) has pushed the proliferation of log-structured key-value stores, represented by Google’s BigTable, HBase and Cassandra; these systems optimize write performance by adopting a log-structured merge design. While providing key-based access methods based on a Put/Get interf...
Lightweight Indexing for Log-Structured Key-Value Stores
The recent shift towards write-intensive workloads on big data (e.g., financial trading, social user-generated data streams) has pushed the proliferation of log-structured key-value stores, represented by Google’s BigTable [1], Apache HBase [2] and Cassandra [3]. While providing key-based data access with a Put/Get interface, these key-value stores do not support value-based access methods, which...
Concurrent Log-Structured Memory for Many-Core Key-Value Stores
Key-value stores are an important tool in managing and accessing large in-memory data sets. As many applications benefit from having as much of their working state fit into main memory, an important design of the memory management of modern key-value stores is the use of log-structured approaches, enabling efficient use of the memory capacity, by compacting objects to avoid fragmented states. H...
Lightweight Indexing of Observational Data in Log-Structured Storage
Huge amounts of data are being generated by sensing devices every day, recording the status of objects and the environment. Such observational data is widely used in scientific research. As the capabilities of sensors keep improving, the data produced are drastically expanding in precision and quantity, making it a write-intensive domain. Log-structured storage is capable of providing high writ...
Exploring Heavy Tails Pareto and Generalized Pareto Distributions
This vignette is designed to give a short overview of Pareto Distributions and Generalized Pareto Distributions (GPD). We will work with the SPC.we data of our quantmod vignette. Therefore we have to reproduce the SPC.we data in exactly the same way as described in the quantmod vignette. In financial data analysis, stock indices such as the S&P 500 index are typically analyzed by using the returns of...